Search CORE

196 research outputs found

Recommended from our members

Uncovering Features in Behaviorally Similar Programs

Author: Su Fang-Hsiang
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

The detection of similar code can support many software engineering tasks such as program understanding and program classification. Many excellent approaches have been proposed to detect programs having similar syntactic features. However, these approaches are unable to identify programs dynamically or statistically close to each other, which we call behaviorally similar programs. We believe the detection of behaviorally similar programs can enhance or even automate the tasks relevant to program classification. In this thesis, we will discuss our current approaches to identify programs having similar behavioral features in multiple perspectives. We first discuss how to detect programs having similar functionality. While the definition of a program’s functionality is undecidable, we use inputs and outputs (I/Os) of programs as the proxy of their functionality. We then use I/Os of programs as a behavioral feature to detect which programs are functionally similar: two programs are functionally similar if they share similar inputs and outputs. This approach has been studied and developed in the C language to detect functionally equivalent programs having equivalent I/Os. Nevertheless, some natural problems in Object Oriented languages, such as input generation and comparisons between application-specific data types, hinder the development of this approach. We propose a new technique, in-vivo detection, which uses existing and meaningful inputs to drive applications systematically and then applies a novel similarity model considering both inputs and outputs of programs, to detect functionally similar programs. We develop the tool, HitoshiIO, based on our in-vivo detection. In the subjects that we study, HitoshiIO correctly detects 68.4% of functionally similar programs, where its false positive rate is only 16.6%. In addition to functional I/Os of programs, we attempt to discover programs having similar execution behavior.
Again, the execution behavior of a program can be undecidable, so we use instructions executed at run-time as a behavioral feature of a program. We create DyCLINK, which observes program executions and encodes them in dynamic instruction graphs. A vertex in a dynamic instruction graph is an instruction and an edge is a type of dependency between two instructions. The problem of detecting which programs have similar executions can then be reduced to a problem of solving inexact graph isomorphism. We propose a link analysis based algorithm, LinkSub, which vectorizes each dynamic instruction graph by the importance of every instruction, to solve this graph isomorphism problem efficiently. In a K Nearest Neighbor (KNN) based program classification experiment, DyCLINK achieves 90+% precision. Because HitoshiIO and DyCLINK both rely on dynamic analysis to expose program behavior, they have better capability to locate and search for behaviorally similar programs than traditional static analysis tools. However, they suffer from some common problems of dynamic analysis, such as input generation and run-time overhead. These problems may make our approaches challenging to scale. Thus, we create the system, Macneto, which integrates static analysis with machine topic modeling and deep learning to approximate program behaviors from their binaries without truly executing programs. In our deobfuscation experiments considering two commercial obfuscators that alter lexical information and syntax in programs, Macneto achieves 90+% precision, where the groundtruth is that the behavior of a program before and after obfuscation should be the same. In this thesis, we offer a more extensive view of similar programs than the traditional definitions.
While the traditional definitions of similar programs mostly use static features, such as syntax and lexical information, we propose to leverage the power of dynamic analysis and machine learning models to trace/collect behavioral features of programs. These behavioral features of programs can then be applied to detect behaviorally similar programs. We believe the techniques we invented in this thesis to detect behaviorally similar programs can improve the development of software engineering and security applications, such as code search and deobfuscation.

Columbia University Academic Commons

Identifying Functionally Similar Code in Complex Codebases

Author: Bell Jonathan
Kaiser Gail E.
Sethumadhavan Simha
Su Fang-Hsiang
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2016

Identifying similar code in software systems can assist many software engineering tasks, including program understanding. While most approaches focus on identifying code that looks alike, some researchers propose to detect instead code that functions alike, known as functional clones. However, previous work has raised the technical challenges of detecting these functional clones in object oriented languages such as Java. We propose a novel technique, In-Vivo Clone Detection, a language-agnostic technique that detects functional clones in arbitrary programs by observing and mining inputs and outputs. We implemented this technique targeting programs that run on the JVM, creating HitoshiIO (available freely on GitHub), a tool to detect functional code clones. Our experimental results show that it is powerful in detecting these functional clones, finding 185 methods that are functionally similar across a corpus of 118 projects, even when there are only very few inputs available. In a random sample of the detected clones, HitoshiIO achieves 68+% true positive rate, while the false positive rate is only 15%.
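
As a rough illustration of comparing code by what it consumes and produces rather than by how it looks, the Python sketch below scores two functions by the overlap of their observed input/output pairs. This is not HitoshiIO's similarity model: the Jaccard metric, the function names, and the sample inputs are all assumptions chosen for illustration.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple


def io_profile(func: Callable, inputs: Iterable[Hashable]) -> Set[Tuple[Hashable, Hashable]]:
    """Record the (input, output) pairs observed when running func."""
    return {(x, func(x)) for x in inputs}


def io_similarity(f: Callable, g: Callable, inputs: Iterable[Hashable]) -> float:
    """Jaccard overlap of two I/O profiles (an assumed, simplified metric)."""
    inputs = list(inputs)
    a, b = io_profile(f, inputs), io_profile(g, inputs)
    if not a and not b:
        return 1.0  # no observations at all: treat as trivially identical
    return len(a & b) / len(a | b)


# Two syntactically different implementations of the same behavior.
def sum_to_n_loop(n: int) -> int:
    total = 0
    for i in range(n + 1):
        total += i
    return total


def sum_to_n_formula(n: int) -> int:
    return n * (n + 1) // 2


# A function with different behavior, for contrast.
def square(n: int) -> int:
    return n * n


if __name__ == "__main__":
    print(io_similarity(sum_to_n_loop, sum_to_n_formula, range(10)))
    print(io_similarity(sum_to_n_loop, square, range(10)))
```

The loop and the closed-form version share every observed I/O pair even though their source text differs, which is the kind of match a syntax-based clone detector would miss.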

Columbia University Academic Commons

Metamorphic Runtime Checking of Applications without Test Oracles

Author: Bell Jonathan Schaffer
Kaiser Gail E.
Murphy Christian
Su Fang-Hsiang
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

Challenges arise in testing applications that do not have test oracles, i.e., for which it is impossible or impractical to know what the correct output should be for general input. Metamorphic testing, introduced by Chen et al., has been shown to be a simple yet effective technique in testing these types of applications: test inputs are transformed in such a way that it is possible to predict the expected change to the output, and if the output resulting from this transformation is not as expected, then a fault must exist. Here, we improve upon previous work by presenting a new technique called Metamorphic Runtime Checking, which automatically conducts metamorphic testing of both the entire application and individual functions during a program's execution. This new approach improves the scope, scale, and sensitivity of metamorphic testing by allowing for the identification of more properties and execution of more tests, and by increasing the likelihood of detecting faults that would not be found by application-level properties alone. We also discuss a technique for automatically discovering functions' metamorphic properties, and present the results of new studies that demonstrate that Metamorphic Runtime Checking advances the state of the art in testing applications without oracles.
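
The core idea of metamorphic testing can be shown with a small Python sketch. It checks one assumed metamorphic property, that reordering the input of an averaging function must not change its output, without knowing the correct output in advance. The function names and the deliberately faulty `buggy_mean` are hypothetical and are not taken from the work described above.

```python
import math
from typing import Callable, List


def rotations(data: List[float]) -> List[List[float]]:
    """All cyclic rotations of data: a deterministic family of permutations."""
    return [data[i:] + data[:i] for i in range(len(data))]


def holds_under_permutation(func: Callable[[List[float]], float],
                            data: List[float]) -> bool:
    """Metamorphic relation: permuting the input must leave the output unchanged."""
    baseline = func(data)
    return all(math.isclose(func(p), baseline, rel_tol=1e-9)
               for p in rotations(data))


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def buggy_mean(values: List[float]) -> float:
    # Deliberate fault for illustration: the first element is skipped.
    return sum(values[1:]) / (len(values) - 1)


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 10.0]
    print(holds_under_permutation(mean, sample))
    print(holds_under_permutation(buggy_mean, sample))
```

No oracle for the "right" mean is needed: the faulty version is exposed only because its output changes when the input is reordered.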

Columbia University Academic Commons

Code Relatives: Detecting Similar Software Behavior

Author: Harvey Kenneth
Jebara Tony
Kaiser Gail E.
Sethumadhavan Simha
Su Fang-Hsiang
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2015

Detecting "similar code" is fundamental to many software engineering tasks. Current tools can help detect code with statically similar syntactic features (code clones). Unfortunately, some code fragments that behave alike without similar syntax may be missed. In this paper, we propose the term "code relatives" to refer to code with dynamically similar execution features. Code relatives can be used for such tasks as implementation-agnostic code search and classification of code with similar behavior for human understanding, which code clone detection cannot achieve. To detect code relatives, we present DyCLINK, which constructs an approximate runtime representation of code using a dynamic instruction graph. With our link analysis based subgraph matching algorithm, DyCLINK detects fine-grained code relatives efficiently. In our experiments, DyCLINK analyzed 290+ million prospective subgraph matches. The results show that DyCLINK detects not only code relatives, but also code clones that the state-of-the-art system is unable to identify. In a code classification problem, DyCLINK achieved 96% precision on average compared with the competitor's 61%.
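
To give a feel for what "link analysis" over an instruction graph might look like, the sketch below ranks the vertices of a tiny dependency graph with a plain power-iteration PageRank. PageRank is used here only as a well-known example of link analysis; the abstract does not say which algorithm DyCLINK uses, and the graph, the instruction labels, and the parameter values are assumptions.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency list.

    Every edge target must also appear as a key of graph.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A vertex without outgoing edges spreads its rank evenly.
            targets = graph[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


def rank_ordering(graph: Dict[str, List[str]]) -> List[str]:
    """Vertices sorted from most to least important (ties broken by name)."""
    rank = pagerank(graph)
    return sorted(graph, key=lambda v: (-rank[v], v))


# Hypothetical instruction graph: an edge u -> v means v depends on u.
EXAMPLE_GRAPH = {
    "load_a": ["add"],
    "load_b": ["add"],
    "add": ["store"],
    "store": [],
}

if __name__ == "__main__":
    print(rank_ordering(EXAMPLE_GRAPH))
```

An importance ordering of this kind turns a graph into a sequence, which is one way a graph comparison could be reduced to a cheaper vector or sequence comparison.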

Columbia University Academic Commons

Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing

Author: Bell Jonathan Schaffer
Kaiser Gail E.
Murphy Christian
Su Fang-Hsiang
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2015

Metamorphic testing is an advanced technique to test programs without a true test oracle, such as machine learning applications. Because these programs have no general oracle to identify their correctness, traditional testing techniques such as unit testing may not be helpful for developers to detect potential bugs. This paper presents a novel system, Kabu, which can dynamically infer properties of methods' states in programs that describe the characteristics of a method before and after transforming its input. These Metamorphic Properties (MPs) are pivotal to detecting potential bugs in programs without test oracles, but most previous work relies solely on human effort to identify them and only considers MPs between input parameters and output result (return value) of a program or method. This paper also proposes a testing concept, Metamorphic Differential Testing (MDT). By detecting different sets of MPs between different versions of the same method, Kabu reports potential bugs for human review. We have performed a preliminary evaluation of Kabu by comparing the MPs detected by humans with the MPs detected by Kabu. Our preliminary results are promising: Kabu can find more MPs than human developers, and MDT is effective at detecting function changes in methods.
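
The two ideas in this abstract, inferring likely properties from observed executions and then comparing the inferred sets across versions, can be sketched in a few lines of Python. This is a simplified illustration and not Kabu's implementation: it looks only at return values rather than method states, and the candidate transformations, relations, and the two versions of the method are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical candidate input transformations.
TRANSFORMS: Dict[str, Callable[[List[int]], List[int]]] = {
    "reverse": lambda xs: xs[::-1],
    "double": lambda xs: [2 * x for x in xs],
    "append_zero": lambda xs: xs + [0],
}

# Hypothetical candidate relations between the output before and after.
RELATIONS: Dict[str, Callable[[int, int], bool]] = {
    "equal": lambda before, after: after == before,
    "doubled": lambda before, after: after == 2 * before,
}


def infer_properties(func: Callable[[List[int]], int],
                     samples: List[List[int]]) -> Set[Tuple[str, str]]:
    """Keep each (transform, relation) pair that holds on every sample."""
    held = set()
    for t_name, transform in TRANSFORMS.items():
        for r_name, relation in RELATIONS.items():
            if all(relation(func(xs), func(transform(xs))) for xs in samples):
                held.add((t_name, r_name))
    return held


def total_v1(xs: List[int]) -> int:
    return sum(xs)


def total_v2(xs: List[int]) -> int:
    # Deliberate regression for illustration: the last element is dropped.
    return sum(xs[:-1])


if __name__ == "__main__":
    samples = [[1, 2, 3], [4, 5], [7]]
    lost = infer_properties(total_v1, samples) - infer_properties(total_v2, samples)
    print(sorted(lost))
```

Properties that hold for one version but not the other are the candidates that would be handed to a human for review. They are only "likely" properties, since they are inferred from a finite set of samples.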

Columbia University Academic Commons

TPMD: a database and resources of microsatellite marker genotyped in Taiwanese populations

Author: Chang Ya-Hui
Chen Chia-Hsiang
Jou Yuh-Shan
Lee Tso-Ching
Pan Wen-Harn
Su Wen-Hui
Sun Hsiao-Fang Sunny
Tsai Shih-Feng
Publication venue: Oxford University Press
Publication date: 17/12/2004

Taiwan Polymorphic Marker Database (TPMD) (http://tpmd.nhri.org.tw/) is a marker database designed to provide experimental details and useful marker information allelotyped in Taiwanese populations, accompanied by resources and technical support. The current version holds more than 372 000 allelotyping data points from 1425 frequently used, fluorescent-labeled microsatellite markers with variation types of dinucleotide, trinucleotide and tetranucleotide. TPMD contains text and map displays with searchable and retrievable options for marker names, chromosomal location in various human genome maps and marker heterozygosity in populations of Taiwanese, Japanese and Caucasian. The integration of marker information in map display is useful for the selection of high heterozygosity and commonly used microsatellite markers to refine mapping of disease loci, followed by identification of disease genes by positional candidate cloning. In addition, our results indicated that the number of markers with heterozygosity over 0.7 in Asian populations is lower than that in Caucasian. To increase accuracy and facilitate genetic studies using microsatellite markers, we also list markers with genotyping difficulty due to ambiguity of allele calling and recommend an optimal set of microsatellite markers for genotyping in Taiwanese, and possible extension of genotyping in other Mongoloid populations.
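
For readers unfamiliar with the heterozygosity threshold mentioned above, the sketch below computes expected heterozygosity from allele frequencies as H = 1 - sum(p_i^2), the standard population-genetics definition without a sample-size correction. The abstract does not state how TPMD computes its values, so the formula choice, the function name, and the allele labels are assumptions.

```python
from collections import Counter
from typing import Iterable


def expected_heterozygosity(alleles: Iterable[str]) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2) over observed allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one allele observation is required")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical allele calls for one marker (labels are repeat lengths).
    two_alleles = ["120", "120", "124", "124"]
    four_alleles = ["120", "124", "128", "132"]
    print(expected_heterozygosity(two_alleles))
    print(expected_heterozygosity(four_alleles))
```

A marker with only two equally frequent alleles cannot exceed 0.5, so a cutoff of 0.7 selects markers with several reasonably common alleles.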

CiteSeerX

Crossref

National Health Research Institutes

PubMed Central

SLACC: Simion-based Language Agnostic Code Clones

Author: Beit-Aharon Jonathan
Bowles Stephen W
Elva Rochelle
Kessel Marcus
Khan Mohd Ehmer
Li Zhenmin
Su Fang-Hsiang
Walenstein Andrew
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/02/2020

Successful cross-language clone detection could enable researchers and developers to create robust language migration tools, facilitate learning additional programming languages once one is mastered, and promote reuse of code snippets over a broader codebase. However, identifying cross-language clones presents special challenges to the clone detection problem. A lack of common underlying representation between arbitrary languages means detecting clones requires one of the following solutions: 1) a static analysis framework replicated across each targeted language with annotations matching language features across all languages, or 2) a dynamic analysis framework that detects clones based on runtime behavior. In this work, we demonstrate the feasibility of the latter solution, a dynamic analysis approach called SLACC for cross-language clone detection. Like prior clone detection techniques, we use input/output behavior to match clones, though we overcome limitations of prior work by amplifying the number of inputs and covering more data types; and as a result, achieve better clusters than prior attempts. Since clusters are generated based on input/output behavior, SLACC supports cross-language clone detection. As an added challenge, we target a statically typed language, Java, and a dynamically typed language, Python. Compared to HitoshiIO, a recent clone detection tool for Java, SLACC retrieves 6 times as many clusters and has higher precision (86.7% vs. 30.7%). This is the first work to perform clone detection for dynamically typed languages (precision = 87.3%) and the first to perform clone detection across languages that lack a common underlying representation (precision = 94.1%). It provides a first step towards the larger goal of scalable language migration tools. Comment: 11 Pages, 3 Figures, Accepted at ICSE 2020 technical track.

arXiv.org e-Print Archive

Crossref

Long-term results of intensity-modulated radiotherapy concomitant with chemotherapy for hypopharyngeal carcinoma aimed at laryngeal preservation

Author: Chou Ying-Hsiang
Hsin Chung-Han
Lee Huei
Lee Jong-Kang
Liu Jung-Tung
Liu Wen-Shan
Su Mao-Chang
Tseng Hsien-Chun
Tseng Szu-Wen
Wang Tzu-Hwei
Wu Ming-Fang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The objective of this retrospective study is to investigate laryngeal preservation and long-term treatment results in hypopharyngeal carcinoma treated with intensity-modulated radiotherapy (IMRT) combined with chemotherapy. Methods: Twenty-seven patients with hypopharyngeal carcinoma (stage II-IV) were enrolled and underwent concurrent chemoradiotherapy. The chemotherapy regimens were monthly cisplatin and 5-fluorouracil for six patients and weekly cisplatin for 19 patients. All patients were treated with IMRT with simultaneous integrated boost technique. Acute and late toxicities were recorded based on CTCAE 3.0 (Common Terminology Criteria for Adverse Events). Results: The median follow-up time for survivors was 53.0 months (range 36-82 months). The initial complete response rate was 85.2%, with a laryngeal preservation rate of 63.0%. The 5-year functional laryngeal, local-regional control, disease-free and overall survival rates were 59.7%, 63.3%, 51.0% and 34.8%, respectively. The most common grade ≥ 3 acute and late effects were dysphagia (63.0%, 17 of 27 patients) and laryngeal stricture (18.5%, 5 of 27 patients), respectively. Patients belonging to the high risk group showed significantly higher risk of tracheostomy compared to the low risk group (p = 0.014). Conclusions: After long-term follow-up, our results confirmed that patients with hypopharyngeal carcinoma treated with IMRT concurrent with platinum-based chemotherapy attain high functional laryngeal and local-regional control survival rates. However, the late effect of laryngeal stricture remains a problem, particularly for high risk group patients.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Indoor CO2 monitoring in a surgical intensive care unit under visitation restrictions during the COVID-19 pandemic

Author: Chao-Han Lai
Chao-Tung Yang
Hsiang-Ching Chang
Pei-Fang Su
Yen Ta Huang
Yi-Chia Liu
Ying-An Chou
Zheng-Yao Wang
Publication venue: 'Frontiers Media SA'
Publication date: 01/07/2023

Background: Indoor CO2 concentration is an important metric of indoor air quality (IAQ). The dynamic temporal pattern of CO2 levels in intensive care units (ICUs), where healthcare providers experience high cognitive load and occupant numbers are frequently changing, has not been comprehensively characterized. Objective: We attempted to describe the dynamic change in CO2 levels in the ICU using an Internet of Things-based (IoT-based) monitoring system. Specifically, given that the COVID-19 pandemic makes hospital visitation restrictions necessary worldwide, this study aimed to appraise the impact of visitation restrictions on CO2 levels in the ICU. Methods: Since February 2020, an IoT-based intelligent indoor environment monitoring system has been implemented in a 24-bed university hospital ICU, which is symmetrically divided into areas A and B. One sensor was placed at the workstation of each area for continuous monitoring. The data of CO2 and other pollutants (e.g., PM2.5) measured under standard and restricted visitation policies during the COVID-19 pandemic were retrieved for analysis. Additionally, the CO2 levels were compared between workdays and non-working days and between areas A and B. Results: The median CO2 level (interquartile range [IQR]) was 616 (524–682) ppm, and only 979 (0.34%) data points obtained in area A during standard visitation were ≥ 1,000 ppm. The CO2 concentrations were significantly lower during restricted visitation (median [IQR]: 576 [556–596] ppm) than during standard visitation (628 [602–663] ppm; p < 0.001). The PM2.5 concentrations were significantly lower during restricted visitation (median [IQR]: 1 [0–1] μg/m3) than during standard visitation (2 [1–3] μg/m3; p < 0.001). The daily CO2 and PM2.5 levels were relatively low at night and elevated as the occupant number increased during clinical handover and visitation.
The CO2 concentrations were significantly higher in area A (median [IQR]: 681 [653–712] ppm) than in area B (524 [504–547] ppm; p < 0.001). The CO2 concentrations were significantly lower on non-working days (median [IQR]: 606 [587–671] ppm) than on workdays (583 [573–600] ppm; p < 0.001). Conclusion: Our study suggests that visitation restrictions during the COVID-19 pandemic may affect CO2 levels in the ICU. Implementation of the IoT-based IAQ sensing network system may facilitate the monitoring of indoor CO2 levels.

Directory of Open Access Journals